** Data reading and variable selection from raw data
** Panel Study of Chinese Family Dynamics (1999 - first wave)


** 01. Reading data **
cap log close
clear all
set more off
cd /*insert your work directory here*/
unicode encoding set Big5 
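//Sketch (optional, not part of the original workflow): running unicode analyze on the
//raw file before unicode translate reports whether it contains strings that may
//still need conversion from the Big5 encoding set above.
//	unicode analyze TWN1999.dta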
unicode translate TWN1999.dta,invalid
use TWN1999.dta
save TWN1999.dta, replace

cd /*insert your work directory here*/


** 02. Constructing year and country variables **

ge year=x02
lab var year "survey year"

ge country=158
lab var country "ISO country code"
//Taiwan: 158 (see "ISO Country Codes.pdf")


** 03. ID variables **

ge pid=x01
lab var pid "person id"
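//Sketch (optional check, not in the original workflow): isid confirms that pid
//uniquely identifies respondents in this wave and stops with an error if any pid
//is duplicated or missing.
//	isid pid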


** 04. Basic Demographics (Sex and Age/birth year) **

ge sex=a01
lab var sex "sex"
lab def sex 1 "male" 2 "female"
lab val sex sex

//Gregorian (AD) year = ROC (Minguo) year + 1911
ge birthyr=1911+a02
lab var birthyr "year of birth"

ge age=1999-birthyr
lab var age "age"


** 05. Siblings **

ge nsibs=f22z01
lab var nsibs "number of siblings"

ge nbro=f22z02+f22z03
lab var nbro "number of brothers"
ge nsis=f22z04+f22z05
lab var nsis "number of sisters"

//birth order = number of older brothers + number of older sisters + 1
ge birthorder=f22z02+f22z04+1
lab var birthorder "birth order"

lab def nsibs 98 "refuse to answer" 196 "no answer" 197 "don't know"
lab val nsibs nbro nsis birthorder nsibs
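//Sketch (hypothetical guard): the sums above add missing-value codes together when a
//component is a non-response code. Assuming codes of 98 and above in f22z02-f22z05 are
//non-response codes (to be checked against the codebook), the sums could be built only
//from valid components. The *_chk names are hypothetical comparison variables.
//	ge nbro_chk=f22z02+f22z03 if f22z02<98 & f22z03<98
//	ge nsis_chk=f22z04+f22z05 if f22z04<98 & f22z05<98
//	ge birthorder_chk=f22z02+f22z04+1 if f22z02<98 & f22z04<98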

** 06. Own education **

//highest level of education obtained
ge educ=b01
lab var educ "highest education attained"

lab def educ 1 "no education" 2 "self-taught" 3 "primary school" 4 "junior school_normal" 5 "junior school_technical school" ///
6 "high school_normal" 7 "high school_technical subjects" 8 "high school_technical school" ///
9 "technical college_5 year" 10 "technical college_2 year" 11 "technical college_3 year" 12 "polytech" ///
13 "university" 14 "master" 15 "PhD" 96 "no answer" 97 "cannot remember" 98 "refuse to answer" 99 "missing"

lab val educ educ


** 07. Parents' education: Father and/or Mother **

//highest education obtained
ge faeduc=f03f1
lab var faeduc "father highest education obtained"

ge moeduc=f03m1
lab var moeduc "mother highest education obtained"

lab val faeduc moeduc educ


** 08. Own occupation **
//employment status
ge empstat=c01
lab var empstat "whether the respondent is working"
lab def empstat 1 "yes" 2 "no"
lab val empstat empstat

//industry: area-specific coding, codebook reference in Chinese
rename c04a01 ind
lab var ind "industry the respondent works in"

//occupation: area-specific coding, codebook reference in Chinese
rename c04a02 occ
lab var occ "respondent's occupation"

//who do you work for
ge work=c04b
lab var work "who do you work for"

lab def work 0 "not applicable" 1 "self-employed with no other employee" 2 "self-employed with other employees" ///
3 "work for private employers/organisations" 4 "work for public organisations" 5 "work for family with regular pay" ///
6 "work for government sectors" 7 "work for NGOs" 8 "work for family with no pay" 96 "don't know" 97 "other"

lab val work work

//company size
ge corpsize=c05
lab var corpsize "number of employees in the company"

lab def corpsize 0 "not applicable" 1 "less than 3" 2 "4-9" 3 "10-29" 4 "30-49" 5 "50-99" 6 "100-499" ///
7 "more than 500" 96 "don't know" 97 "other" 99 "missing"

lab val corpsize corpsize


** 09. Parents' occupation **

//parents' occupation when the respondent was 16
rename b09a faocc16 
lab var faocc16 "father's occupation when the respondent was 16"
rename b10a moocc16
lab var moocc16 "mother's occupation when the respondent was 16"

//who the parents worked for when the respondent was 16
ge fawork16=b09b
lab var fawork16 "who did father work for when the respondent was 16"
ge mowork16=b10b
lab var mowork16 "who did mother work for when the respondent was 16"
lab val fawork16 mowork16 work


** 10. Tabulate the Identified Variables **

log using /*insert your work directory and log file name here*/, replace text

** Data reading and variable selection from raw data
** Panel Study of Chinese Family Dynamics (1999 - first wave)

** Sex **
tab sex

** Age, Birth Year **
sum age birthyr, d

** Siblings **
sum nsibs nbro nsis birthorder, d

** R's Own Education **
tab1 educ 

** Parental Education **
tab1 faeduc moeduc 

** R's Own Occupation **
tab1 empstat ind occ work corpsize

** Parental Occupation **
tab1 faocc16 moocc16 fawork16 mowork16


log close

** 11. Keep the identified variables only **

keep year country pid sex age birthyr ///
	 nbro nsis nsibs birthorder ///
	 educ faeduc moeduc ///
	 empstat ind occ work corpsize ///
	 faocc16 moocc16 fawork16 mowork16


** 12. Save the Data File **

saveold /*insert your work directory here*/, replace



** 13. Homogenising education **
** Own Education **
rename educ educ_cat

ge educ_yrs=0 if educ_cat==1
replace educ_yrs=0 if educ_cat==2
replace educ_yrs=6 if educ_cat==3
replace educ_yrs=9 if educ_cat==4
replace educ_yrs=9 if educ_cat==5
replace educ_yrs=12 if educ_cat==6
replace educ_yrs=12 if educ_cat==7
replace educ_yrs=12 if educ_cat==8
replace educ_yrs=14 if educ_cat==9
replace educ_yrs=14 if educ_cat==10
replace educ_yrs=15 if educ_cat==11
replace educ_yrs=12 if educ_cat==12
replace educ_yrs=16 if educ_cat==13
replace educ_yrs=19 if educ_cat==14
replace educ_yrs=22 if educ_cat==15
lab var educ_yrs "respondent highest education in years"
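//Sketch (alternative, not the original method): the category-to-years mapping above can
//be written as a single recode. educ_yrs_r is a hypothetical comparison variable; the
//(else=.) rule sends unlisted codes such as 96-99 to missing, as the chain above does.
//	recode educ_cat (1 2=0) (3=6) (4 5=9) (6/8 12=12) (9 10=14) (11=15) (13=16) (14=19) (15=22) (else=.), generate(educ_yrs_r)
//	assert educ_yrs_r==educ_yrs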

ge educ_ISCED=020 if educ_cat==1
replace educ_ISCED=020 if educ_cat==2
replace educ_ISCED=100 if educ_cat==3
replace educ_ISCED=200 if educ_cat==4
replace educ_ISCED=200 if educ_cat==5
replace educ_ISCED=340 if educ_cat==6
replace educ_ISCED=350 if educ_cat==7
replace educ_ISCED=350 if educ_cat==8
replace educ_ISCED=500 if educ_cat==9
replace educ_ISCED=500 if educ_cat==10
replace educ_ISCED=500 if educ_cat==11
replace educ_ISCED=500 if educ_cat==12
replace educ_ISCED=600 if educ_cat==13
replace educ_ISCED=700 if educ_cat==14
replace educ_ISCED=800 if educ_cat==15
lab var educ_ISCED "respondent highest education (ISCED code)"
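//Sketch (alternative, not the original method): the ISCED mapping can likewise be
//expressed as one recode. educ_ISCED_r is a hypothetical comparison variable.
//	recode educ_cat (1 2=20) (3=100) (4 5=200) (6=340) (7 8=350) (9/12=500) (13=600) (14=700) (15=800) (else=.), generate(educ_ISCED_r)
//	assert educ_ISCED_r==educ_ISCED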

** Parents Education **
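//flag = 1: the father's education variable refers to the respondent's own father
ge faeduc_flag=1
lab var faeduc_flag "father's education refers to the father"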

rename faeduc faeduc_cat
rename moeduc maeduc_cat

ge faeduc_yrs=0 if faeduc_cat==1
replace faeduc_yrs=0 if faeduc_cat==2
replace faeduc_yrs=6 if faeduc_cat==3
replace faeduc_yrs=9 if faeduc_cat==4
replace faeduc_yrs=9 if faeduc_cat==5
replace faeduc_yrs=12 if faeduc_cat==6
replace faeduc_yrs=12 if faeduc_cat==7
replace faeduc_yrs=12 if faeduc_cat==8
replace faeduc_yrs=14 if faeduc_cat==9
replace faeduc_yrs=14 if faeduc_cat==10
replace faeduc_yrs=15 if faeduc_cat==11
replace faeduc_yrs=12 if faeduc_cat==12
replace faeduc_yrs=16 if faeduc_cat==13
replace faeduc_yrs=19 if faeduc_cat==14
replace faeduc_yrs=22 if faeduc_cat==15
replace faeduc_yrs=. if faeduc_cat==96 | faeduc_cat==97 | faeduc_cat==98 | faeduc_cat==99
lab var faeduc_yrs "father's education in years"

ge maeduc_yrs=0 if maeduc_cat==1
replace maeduc_yrs=0 if maeduc_cat==2
replace maeduc_yrs=6 if maeduc_cat==3
replace maeduc_yrs=9 if maeduc_cat==4
replace maeduc_yrs=9 if maeduc_cat==5
replace maeduc_yrs=12 if maeduc_cat==6
replace maeduc_yrs=12 if maeduc_cat==7
replace maeduc_yrs=12 if maeduc_cat==8
replace maeduc_yrs=14 if maeduc_cat==9
replace maeduc_yrs=14 if maeduc_cat==10
replace maeduc_yrs=15 if maeduc_cat==11
replace maeduc_yrs=12 if maeduc_cat==12
replace maeduc_yrs=16 if maeduc_cat==13
replace maeduc_yrs=19 if maeduc_cat==14
replace maeduc_yrs=22 if maeduc_cat==15
replace maeduc_yrs=. if maeduc_cat==96 | maeduc_cat==97 | maeduc_cat==98 | maeduc_cat==99
lab var maeduc_yrs "mother's education in years"

ge faeduc_ISCED=020 if faeduc_cat==1
replace faeduc_ISCED=020 if faeduc_cat==2
replace faeduc_ISCED=100 if faeduc_cat==3
replace faeduc_ISCED=200 if faeduc_cat==4
replace faeduc_ISCED=200 if faeduc_cat==5
replace faeduc_ISCED=340 if faeduc_cat==6
replace faeduc_ISCED=350 if faeduc_cat==7
replace faeduc_ISCED=350 if faeduc_cat==8
replace faeduc_ISCED=500 if faeduc_cat==9
replace faeduc_ISCED=500 if faeduc_cat==10
replace faeduc_ISCED=500 if faeduc_cat==11
replace faeduc_ISCED=500 if faeduc_cat==12
replace faeduc_ISCED=600 if faeduc_cat==13
replace faeduc_ISCED=700 if faeduc_cat==14
replace faeduc_ISCED=800 if faeduc_cat==15
lab var faeduc_ISCED "father highest education (ISCED code)"

ge maeduc_ISCED=020 if maeduc_cat==1
replace maeduc_ISCED=020 if maeduc_cat==2
replace maeduc_ISCED=100 if maeduc_cat==3
replace maeduc_ISCED=200 if maeduc_cat==4
replace maeduc_ISCED=200 if maeduc_cat==5
replace maeduc_ISCED=340 if maeduc_cat==6
replace maeduc_ISCED=350 if maeduc_cat==7
replace maeduc_ISCED=350 if maeduc_cat==8
replace maeduc_ISCED=500 if maeduc_cat==9
replace maeduc_ISCED=500 if maeduc_cat==10
replace maeduc_ISCED=500 if maeduc_cat==11
replace maeduc_ISCED=500 if maeduc_cat==12
replace maeduc_ISCED=600 if maeduc_cat==13
replace maeduc_ISCED=700 if maeduc_cat==14
replace maeduc_ISCED=800 if maeduc_cat==15
lab var maeduc_ISCED "mother highest education (ISCED code)"
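//Sketch (alternative, not the original method): because the father's and mother's blocks
//repeat the same mapping, a loop over the prefixes fa and ma could build both sets of
//variables at once, replacing the four blocks above rather than running alongside them.
//This assumes the parents' categories use the same codes as educ_cat (as the shared educ
//value label implies) and maps master's and PhD as for the respondent.
//	foreach p in fa ma {
//		recode `p'educ_cat (1 2=0) (3=6) (4 5=9) (6/8 12=12) (9 10=14) (11=15) (13=16) (14=19) (15=22) (else=.), generate(`p'educ_yrs)
//		recode `p'educ_cat (1 2=20) (3=100) (4 5=200) (6=340) (7 8=350) (9/12=500) (13=600) (14=700) (15=800) (else=.), generate(`p'educ_ISCED)
//	}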


** 14. Homogenising siblings **
//cutoff
ge nbro_flag=99
lab var nbro_flag "cutoff of number of brothers"
ge nsis_flag=99
lab var nsis_flag "cutoff of number of sisters"
ge nsibs_flag=99
lab var nsibs_flag "cutoff of total number of siblings"

lab def nsib_flag 99 "no cutoff"
lab val nbro_flag nsis_flag nsibs_flag nsib_flag

//recode missing
replace nbro=. if inlist(nbro,196,197)
replace nsis=. if inlist(nsis,196,197)
replace nsibs=. if nsibs==98
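//Sketch (alternative to the replace lines above, not the original method): mvdecode can
//recode the same codes while keeping the reason for missingness as extended missing
//values. The choice of .r (refuse), .n (no answer) and .d (don't know) is an illustrative
//assumption based on the nsibs value labels.
//	mvdecode nbro nsis, mv(196=.n \ 197=.d)
//	mvdecode nsibs, mv(98=.r)
//	lab def nsibs .r "refuse to answer" .n "no answer" .d "don't know", modify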

** 15. Tab Education and Sibling Variables **
tab1 sex age birthyr
tab1 educ_cat educ_yrs faeduc_cat faeduc_yrs maeduc_cat maeduc_yrs faeduc_flag 
tab1 educ_ISCED faeduc_ISCED maeduc_ISCED
tab1 nbro nsis nsibs nbro_flag nsis_flag nsibs_flag


** 16. Save the Data File **

saveold /*insert your work directory here*/, replace
